AAAI.2021 - Philosophy and Ethics of AI

Total: 34

#1 Explaining A Black-box By Using A Deep Variational Information Bottleneck Approach

Authors: Seojin Bang ; Pengtao Xie ; Heewook Lee ; Wei Wu ; Eric Xing

Interpretable machine learning has gained much attention recently. Briefness and comprehensiveness are both necessary in order to provide a large amount of information concisely when explaining a black-box decision system. However, existing interpretable machine learning methods fail to consider briefness and comprehensiveness simultaneously, leading to redundant explanations. We propose the variational information bottleneck for interpretation, VIBI, a system-agnostic interpretable method that provides a brief but comprehensive explanation. VIBI adopts the information bottleneck principle, an information-theoretic criterion, for finding such explanations. For each instance, VIBI selects key features that are maximally compressed about an input (briefness) and informative about the decision made by the black-box system on that input (comprehensiveness). We evaluate VIBI on three datasets and compare it with state-of-the-art interpretable machine learning methods in terms of both interpretability and fidelity, evaluated by human and quantitative metrics.
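The information bottleneck criterion mentioned above can be written, in its generic form, as a trade-off between informativeness and compression; the formulation below is a standard illustrative sketch in this spirit, not necessarily the paper's exact variational objective (z denotes the selected explanation, x the input, y the black-box output, and β the trade-off weight):

```latex
% Generic information-bottleneck trade-off (illustrative notation):
% keep z informative about the black-box output y while compressing
% away everything else about the input x.
\max_{p(z \mid x)} \; I(z; y) \;-\; \beta \, I(z; x)
```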

#2 Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork

Authors: Gagan Bansal ; Besmira Nushi ; Ece Kamar ; Eric Horvitz ; Daniel S. Weld

AI practitioners typically strive to develop the most accurate systems, making an implicit assumption that the AI system will function autonomously. However, in practice, AI systems are often used to provide advice to people in domains ranging from criminal justice and finance to healthcare. In such AI-advised decision making, humans and machines form a team, where the human is responsible for making final decisions. But is the most accurate AI the best teammate? We argue "not necessarily" --- predictable performance may be worth a slight sacrifice in AI accuracy. Instead, we argue that AI systems should be trained in a human-centered manner, directly optimized for team performance. We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves. To optimize team performance for this setting, we maximize the team's expected utility, expressed in terms of the quality of the final decision, the cost of verifying, and the individual accuracies of people and machines. Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance, and show the benefit of modeling teamwork during training through improvements in expected team utility across datasets, considering parameters such as human skill and the cost of mistakes. We discuss the shortcomings of current optimization approaches beyond well-studied loss functions such as log-loss, and encourage future work on AI optimization problems motivated by human-AI collaboration.
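As a rough illustration of the kind of expected-team-utility objective described above, the sketch below scores a single decision for a team in which the human either accepts the AI recommendation or solves the task themselves; all parameter names, the acceptance probability, and the exact functional form are illustrative assumptions rather than the paper's formulation.

```python
# Hedged sketch: one plausible way to write an expected team utility for a
# setting where a human either accepts the AI's recommendation or solves the
# task themselves. Parameter names and the functional form are illustrative.

def expected_team_utility(p_accept: float,
                          ai_accuracy: float,
                          human_accuracy: float,
                          reward_correct: float = 1.0,
                          cost_mistake: float = 5.0,
                          cost_solve: float = 0.1) -> float:
    """Expected utility of the human-AI team for one decision."""
    # If the human accepts the recommendation, the outcome follows AI accuracy.
    utility_accept = ai_accuracy * reward_correct - (1 - ai_accuracy) * cost_mistake
    # If the human solves the task themselves, the outcome follows human
    # accuracy and the extra solving effort is paid.
    utility_solve = (human_accuracy * reward_correct
                     - (1 - human_accuracy) * cost_mistake
                     - cost_solve)
    return p_accept * utility_accept + (1 - p_accept) * utility_solve


# Example: a highly accurate AI is not automatically the best teammate if the
# human cannot reliably decide when to accept its advice.
print(expected_team_utility(p_accept=0.8, ai_accuracy=0.95, human_accuracy=0.85))
```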

#3 TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments

Authors: Tom Bewley ; Jonathan Lawry

In explainable artificial intelligence, there is increasing interest in understanding the behaviour of autonomous agents to build trust and validate performance. Modern agent architectures, such as those trained by deep reinforcement learning, are currently so lacking in interpretable structure as to effectively be black boxes, but insights may still be gained from an external, behaviourist perspective. Inspired by conceptual spaces theory, we suggest that a versatile first step towards general understanding is to discretise the state space into convex regions, jointly capturing similarities over the agent's action, value function and temporal dynamics within a dataset of observations. We create such a representation using a novel variant of the CART decision tree algorithm, and demonstrate how it facilitates practical understanding of black box agents through prediction, visualisation and rule-based explanation.
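A hedged sketch of the kind of joint split criterion such a CART-style variant could use, scoring a candidate region of state space by how homogeneous it is in the agent's action, value estimate, and temporal dynamics; the weights and the individual impurity terms are illustrative assumptions, not the paper's definitions.

```python
# Hedged sketch: a joint impurity over actions, values, and dynamics that a
# CART-style splitting procedure could minimize. Lower is better: a "pure"
# region has one dominant action, low value variance, and consistent
# transitions. All terms and weights are illustrative assumptions.
import numpy as np

def region_impurity(actions, values, next_state_deltas,
                    w_action=1.0, w_value=1.0, w_dynamics=1.0):
    # Action impurity: 1 - frequency of the most common action in the region.
    _, counts = np.unique(actions, return_counts=True)
    action_impurity = 1.0 - counts.max() / counts.sum()
    # Value impurity: variance of the value estimates in the region.
    value_impurity = np.var(values)
    # Dynamics impurity: mean per-dimension variance of observed transitions.
    dynamics_impurity = np.mean(np.var(next_state_deltas, axis=0))
    return (w_action * action_impurity
            + w_value * value_impurity
            + w_dynamics * dynamics_impurity)
```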

#4 Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example

Authors: Serena Booth ; Yilun Zhou ; Ankit Shah ; Julie Shah

Post-hoc explanation methods are gaining popularity for interpreting, understanding, and debugging neural networks. Most analyses using such methods explain decisions in response to inputs drawn from the test set. However, the test set may have few examples that trigger some model behaviors, such as high-confidence failures or ambiguous classifications. To address these challenges, we introduce a flexible model inspection framework: Bayes-TrEx. Given a data distribution, Bayes-TrEx finds in-distribution examples which trigger a specified prediction confidence. We demonstrate several use cases of Bayes-TrEx, including revealing highly confident (mis)classifications, visualizing class boundaries via ambiguous examples, understanding novel-class extrapolation behavior, and exposing neural network overconfidence. We use Bayes-TrEx to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and we show that this framework enables more flexible holistic model analysis than just inspecting the test set. Code and supplemental material are available at https://github.com/serenabooth/Bayes-TrEx.
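One way to picture the "examples that trigger a specified prediction confidence" idea is as posterior sampling under a confidence constraint. The sketch below uses a plain random-walk Metropolis sampler targeting a density proportional to p(x) times a Gaussian likelihood centred on the target confidence; the density model, classifier interface, and all hyperparameters are stand-ins, not the paper's implementation.

```python
# Hedged sketch: sample inputs whose classifier confidence sits near a target
# level while staying plausible under a data density model, via random-walk
# Metropolis. log_p_data and confidence are assumed callables; everything
# else is an illustrative choice.
import numpy as np

def sample_confidence_level(log_p_data, confidence, x0,
                            target=0.5, sigma=0.05,
                            n_steps=5000, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)

    def log_post(x):
        # Posterior ∝ p(x) * N(confidence(x); target, sigma)
        return log_p_data(x) - 0.5 * ((confidence(x) - target) / sigma) ** 2

    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.shape)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = proposal, lp_prop
        samples.append(x.copy())
    return np.array(samples)
```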

#5 FIMAP: Feature Importance by Minimal Adversarial Perturbation

Authors: Matt Chapman-Rounds ; Umang Bhatt ; Erik Pazos ; Marc-Andre Schulz ; Konstantinos Georgatzis

Instance-based model-agnostic feature importance explanations (LIME, SHAP, L2X) are a popular form of algorithmic transparency. These methods generally return either a weighting or subset of input features as an explanation for the classification of an instance. An alternative literature argues instead that counterfactual instances, which alter the black-box model's classification, provide a more actionable form of explanation. We present Feature Importance by Minimal Adversarial Perturbation (FIMAP), a neural network based approach that unifies feature importance and counterfactual explanations. We show that this approach combines the two paradigms, recovering the output of feature-weighting methods in continuous feature spaces, whilst indicating the direction in which the nearest counterfactuals can be found. Our method also provides an implicit confidence estimate in its own explanations, something existing methods lack. Additionally, FIMAP improves upon the speed of sampling-based methods, such as LIME, by an order of magnitude, allowing for explanation deployment in time-critical applications. We extend our approach to categorical features using a partitioned Gumbel layer and demonstrate its efficacy on standard datasets.
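A minimal sketch of the unified objective such a method might optimize: a perturbation network g is trained so that x + g(x) flips a differentiable surrogate of the black-box classifier f towards a target class while the perturbation stays small. The architecture, loss weights, and training details are illustrative assumptions.

```python
# Hedged sketch of a FIMAP-style training objective: make x + g(x) flip the
# (surrogate of the) classifier f while keeping the perturbation minimal.
import torch
import torch.nn as nn

def fimap_style_loss(f, g, x, target_class, lam=1.0):
    delta = g(x)                                   # proposed perturbation
    logits = f(x + delta)                          # surrogate prediction
    targets = torch.full((x.shape[0],), target_class, dtype=torch.long)
    flip_loss = nn.functional.cross_entropy(logits, targets)
    size_penalty = delta.pow(2).sum(dim=1).mean()  # keep the perturbation small
    return flip_loss + lam * size_penalty

# The learned delta then plays both roles the abstract describes: its signed
# magnitude acts as a feature-importance weighting, and x + delta points
# towards a nearby counterfactual.
```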

#6 Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise

Authors: Pengfei Chen ; Junjie Ye ; Guangyong Chen ; Jingwei Zhao ; Pheng-Ann Heng

Supervised learning under label noise has seen numerous advances recently, while existing theoretical findings and empirical results broadly build on the class-conditional noise (CCN) assumption that the noise is independent of input features given the true label. In this work, we present a theoretical hypothesis test and prove that noise in real-world datasets is unlikely to be CCN, which confirms that label noise should depend on the instance and justifies the urgent need to go beyond the CCN assumption. The theoretical results motivate us to study the more general and practically relevant instance-dependent noise (IDN). To stimulate the development of theory and methodology on IDN, we formalize an algorithm to generate controllable IDN and present both theoretical and empirical evidence to show that IDN is semantically meaningful and challenging. As a primary attempt to combat IDN, we present an algorithm termed self-evolution average label (SEAL), which not only stands out under IDN with various noise fractions, but also improves generalization on the real-world noise benchmark Clothing1M. Our code is released. Notably, our theoretical analysis in Section 2 provides rigorous motivation for studying IDN, an important topic that deserves more research attention in the future.
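A minimal sketch of the "average label" idea named above: record the model's softmax predictions for each example across epochs, average them, and retrain against the averaged soft labels. The averaging scheme shown is an assumption about the general mechanism, not the paper's exact procedure.

```python
# Hedged sketch: average a model's per-epoch predicted probabilities to build
# corrected soft labels, which smooth out epoch-to-epoch memorization of
# instance-dependent noisy labels.
import numpy as np

def seal_style_soft_labels(per_epoch_probs):
    """per_epoch_probs: array of shape (n_epochs, n_examples, n_classes)
    holding the model's predicted class probabilities at each epoch.
    Returns averaged soft labels of shape (n_examples, n_classes)."""
    return np.mean(per_epoch_probs, axis=0)

# A fresh model is then trained against these soft labels (e.g. with a
# cross-entropy or KL loss) instead of the original noisy hard labels.
```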

#7 Robustness of Accuracy Metric and its Inspirations in Learning with Noisy Labels

Authors: Pengfei Chen ; Junjie Ye ; Guangyong Chen ; Jingwei Zhao ; Pheng-Ann Heng

For multi-class classification under class-conditional label noise, we prove that the accuracy metric itself can be robust. We develop this finding's implications in two essential aspects, training and validation, with which we address critical issues in learning with noisy labels. For training, we show that maximizing training accuracy on sufficiently many noisy samples yields an approximately optimal classifier. For validation, we prove that a noisy validation set is reliable, addressing the critical demand for model selection in scenarios such as hyperparameter tuning and early stopping. Previously, model selection using noisy validation samples had not been theoretically justified. We verify our theoretical results and additional claims with extensive experiments. We show characterizations of models trained with noisy labels, motivated by our theoretical results, and verify the utility of a noisy validation set through the impressive performance of a framework termed noisy best teacher and student (NTS). Our code is released.

#8 A Unified Taylor Framework for Revisiting Attribution Methods

Authors: Huiqi Deng ; Na Zou ; Mengnan Du ; Weifu Chen ; Guocan Feng ; Xia Hu

Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods are often built upon empirical intuitions and heuristics. There is still no general theoretical framework that can both unify these attribution methods and reveal their rationales, fidelity, and limitations. To bridge this gap, we propose a Taylor attribution framework and reformulate seven mainstream attribution methods within it. Based on these reformulations, we analyze the attribution methods in terms of rationale, fidelity, and limitations. Moreover, we establish three principles for a good attribution in the Taylor attribution framework: low approximation error, correct contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and reveal a positive correlation between attribution performance and the number of principles an attribution method follows, via benchmarking on real-world datasets.
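For the first-order case, the framework described above can be pictured as follows (illustrative notation): expand the model f around a baseline x̄ and read per-feature attributions off the linear terms, so that the approximation error and the baseline choice correspond directly to two of the three principles.

```latex
% First-order Taylor attribution (illustrative notation, not the paper's
% exact formulation): expand f around a baseline \bar{x} and attribute the
% linear terms to individual features.
f(x) \approx f(\bar{x}) + \sum_i \frac{\partial f}{\partial x_i}\Big|_{\bar{x}} (x_i - \bar{x}_i)
\qquad\Longrightarrow\qquad
a_i = \frac{\partial f}{\partial x_i}\Big|_{\bar{x}} (x_i - \bar{x}_i)
```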

#9 Verifiable Machine Ethics in Changing Contexts

Authors: Louise A. Dennis ; Martin Mose Bentzen ; Felix Lindner ; Michael Fisher

Many systems proposed for the implementation of ethical reasoning involve an encoding of user values as a set of rules or a model. We consider the question of how changes of context affect these encodings. We propose the use of a reasoning cycle, in which information about the ethical reasoner's context is imported in a logical form, and we propose that context-specific aspects of an ethical encoding be prefaced by a guard formula. This guard formula should evaluate to true when the reasoner is in the appropriate context and the relevant parts of the reasoner's rule set or model should be updated accordingly. This architecture allows techniques for the model-checking of agent-based autonomous systems to be used to verify that all contexts respect key stakeholder values. We implement this framework using the hybrid ethical reasoning agents system (HERA) and the model-checking agent programming languages (MCAPL) framework.

#10 Epistemic Logic of Know-Who

Authors: Sophia Epstein ; Pavel Naumov

The paper suggests a definition of "know who" as a modality using Grove-Halpern semantics of names. It also introduces a logical system that describes the interplay between modalities "knows who", "knows", and "for all agents". The main technical result is a completeness theorem for the proposed system.

#11 Agent Incentives: A Causal Perspective

Authors: Tom Everitt ; Ryan Carey ; Eric D. Langlois ; Pedro A. Ortega ; Shane Legg

We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.

#12 Individual Fairness in Kidney Exchange Programs

Authors: Golnoosh Farnadi ; William St-Arnaud ; Behrouz Babaki ; Margarida Carvalho

Kidney transplant is the preferred method of treatment for patients suffering from kidney failure. However, not all patients can find a donor who matches their physiological characteristics. Kidney exchange programs (KEPs) seek to match such incompatible patient-donor pairs together, usually with the main objective of maximizing the total number of transplants. Since selecting one optimal solution translates to a decision on who receives a transplant, it has a major effect on the lives of patients. The current practice of selecting an optimal solution does not necessarily ensure fairness in the selection process. In this paper, the existence of multiple optimal plans for a KEP is explored as a means to achieve individual fairness. We propose the use of randomized policies for selecting an optimal solution in which patients' equal opportunity to receive a transplant is promoted. Our approach gives rise to the problem of enumerating all optimal solutions, which we tackle using a hybrid of constraint programming and linear programming. The advantages of our proposed method over the common practice of using the optimal solution returned by a solver are demonstrated through computational experiments. Our methodology enables decision makers to fully control KEP outcomes, overcoming any potential bias or vulnerability intrinsic to a deterministic solver.
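One natural reading of "randomized policies promoting equal opportunity", given an enumerated set of optimal exchange plans, is a lottery that maximizes the minimum probability with which any patient receives a transplant. The sketch below solves that max-min problem as a small linear program; the objective and the enumeration interface are illustrative assumptions and may differ from the paper's fairness criterion.

```python
# Hedged sketch: a max-min lottery over enumerated optimal exchange plans.
import numpy as np
from scipy.optimize import linprog

def maximin_lottery(matched):
    """matched: (n_solutions, n_patients) 0/1 matrix, matched[k, i] = 1 if
    optimal solution k gives patient i a transplant. Returns a probability
    vector over the optimal solutions."""
    n_sol, n_pat = matched.shape
    # Variables: p_1..p_K (solution probabilities) and t (worst-case match
    # probability). Maximize t  <=>  minimize -t.
    c = np.concatenate([np.zeros(n_sol), [-1.0]])
    # For every patient i:  t - sum_k p_k * matched[k, i] <= 0.
    A_ub = np.hstack([-matched.T, np.ones((n_pat, 1))])
    b_ub = np.zeros(n_pat)
    # Probabilities sum to one.
    A_eq = np.concatenate([np.ones(n_sol), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_sol + [(0, 1)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_sol]
```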

#13 Fair Representations by Compression

Authors: Xavier Gitiaux ; Huzefa Rangwala

Organizations that collect and sell data face increasing scrutiny for the discriminatory use of data. We propose a novel unsupervised approach to map data into a compressed binary representation independent of sensitive attributes. We show that in an information bottleneck framework, a parsimonious representation should filter out information related to sensitive attributes if they are provided directly to the decoder. Empirical results show that the method achieves state-of-the-art accuracy-fairness trade-off and that explicit control of the entropy of the representation bit stream allows the user to move smoothly and simultaneously along both rate-distortion and rate-fairness curves.

#14 Amnesiac Machine Learning

Authors: Laura Graves ; Vineel Nagisetty ; Vijay Ganesh

The Right to be Forgotten is part of the recently enacted General Data Protection Regulation (GDPR), a law that affects any data holder with data on European Union residents. It gives EU residents the ability to request deletion of their personal data, including training records used to train machine learning models. Unfortunately, deep neural network models are vulnerable to information-leaking attacks such as model inversion attacks, which extract class information from a trained model, and membership inference attacks, which determine the presence of an example in a model's training data. If a malicious party can mount an attack and learn private information that was meant to be removed, then the model owner has not properly protected their users' rights and their models may not be compliant with the GDPR. In this paper, we present two efficient methods that address the question of how a model owner or data holder may delete personal data from models in such a way that the models are not vulnerable to model inversion and membership inference attacks while maintaining model efficacy. We start by presenting a real-world threat model showing that simply removing training data is insufficient to protect users. We follow up with two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while remaining compliant with regulations. We provide extensive empirical analysis showing that these methods are indeed efficient, safe to apply, and effectively remove learned information about sensitive data from trained models while maintaining model efficacy.
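A hedged sketch of the bookkeeping that an "amnesiac"-style removal method of this kind implies: keep the parameter updates produced by each training batch and, on a deletion request, subtract the updates of every batch that contained the sensitive example. The storage format and the follow-up fine-tuning are assumptions about the general mechanism, not the paper's exact procedure.

```python
# Hedged sketch: record per-batch parameter updates during training, and undo
# the updates of every batch containing a to-be-forgotten example.

class AmnesiacTrainerSketch:
    def __init__(self, model_params):
        self.params = dict(model_params)     # name -> parameter array
        self.batch_updates = []              # list of (example_ids, {name: delta})

    def record_step(self, example_ids, deltas):
        """Call after each optimizer step with the parameter change it caused."""
        self.batch_updates.append((set(example_ids), deltas))

    def forget(self, example_id):
        """Subtract the updates of every batch that contained example_id."""
        for ids, deltas in self.batch_updates:
            if example_id in ids:
                for name, delta in deltas.items():
                    self.params[name] = self.params[name] - delta
        # In practice a brief fine-tuning pass on retained data would likely
        # follow, to recover accuracy lost by removing those updates.
```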

#15 On the Verification of Neural ODEs with Stochastic Guarantees

Authors: Sophie Grunbacher ; Ramin Hasani ; Mathias Lechner ; Jacek Cyranka ; Scott A. Smolka ; Radu Grosu

We show that Neural ODEs, an emerging class of time-continuous neural networks, can be verified by solving a set of global-optimization problems. For this purpose, we introduce Stochastic Lagrangian Reachability (SLR), an abstraction-based technique for constructing a tight Reachtube (an over-approximation of the set of reachable states over a given time-horizon), and provide stochastic guarantees in the form of confidence intervals for the Reachtube bounds. SLR inherently avoids the infamous wrapping effect (accumulation of over-approximation errors) by performing local optimization steps to expand safe regions instead of repeatedly forward-propagating them as is done by deterministic reachability methods. To enable fast local optimizations, we introduce a novel forward-mode adjoint sensitivity method to compute gradients without the need for backpropagation. Finally, we establish asymptotic and non-asymptotic convergence rates for SLR.

#16 PenDer: Incorporating Shape Constraints via Penalized Derivatives

Authors: Akhil Gupta ; Lavanya Marla ; Ruoyu Sun ; Naman Shukla ; Arinbjörn Kolbeinsson

When deploying machine learning models in the real world, system designers may wish that models exhibit certain shape behavior, i.e., that model outputs follow a particular shape with respect to input features. Trends such as monotonicity, convexity, and diminishing or accelerating returns are some of the desired shapes. The presence of these shapes makes the model more interpretable for system designers, and adequately fair for customers. We observe that many such common shapes are related to derivatives, and propose a new approach, PenDer (Penalizing Derivatives), which incorporates these shape constraints by penalizing the derivatives. We further present an Augmented Lagrangian Method (ALM) to solve this constrained optimization problem. Experiments on three real-world datasets illustrate that even though both PenDer and state-of-the-art Lattice models achieve similar conformance to shape, PenDer better captures the sensitivity of predictions with respect to the intended features. We also demonstrate that PenDer achieves better test performance than Lattice while enforcing more desirable shape behavior.
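A minimal sketch of a penalized-derivative term of the kind described above, here for a monotonically increasing constraint on a single feature; the squared-hinge penalty form and its fixed weight are illustrative assumptions, whereas the paper solves the constrained problem with an augmented Lagrangian method.

```python
# Hedged sketch: penalize negative partial derivatives of the model output
# with respect to one feature, enforcing (approximate) monotonic increase.
import torch

def monotonicity_penalty(model, x, feature_idx):
    x = x.clone().requires_grad_(True)
    y = model(x).sum()
    grads = torch.autograd.grad(y, x, create_graph=True)[0]
    d_feature = grads[:, feature_idx]
    # Squared hinge on negative derivatives: zero when the shape is respected.
    return torch.clamp(-d_feature, min=0.0).pow(2).mean()

# Fixed-penalty variant of the training objective:
#   total_loss = task_loss + lam * monotonicity_penalty(model, x, feature_idx)
```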

#17 Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Authors: Shir Gur ; Ameen Ali ; Lior Wolf

Neural network visualization techniques mark image locations by their relevance to the network's classification. Existing methods are effective in highlighting the regions that affect the resulting classification the most. However, as we show, these methods are limited in their ability to identify the support for alternative classifications, an effect we name the saliency bias hypothesis. In this work, we integrate two lines of research, gradient-based methods and attribution-based methods, and develop an algorithm that provides per-class explainability. The algorithm back-projects the per-pixel local influence in a manner that is guided by the local attributions, while correcting for salient features that would otherwise bias the explanation. In an extensive battery of experiments, we demonstrate the ability of our method to produce class-specific visualizations, not just for the predicted label. Remarkably, the method obtains state-of-the-art results in benchmarks commonly applied to gradient-based methods as well as in those employed mostly for evaluating attribution methods. Using a new unsupervised procedure, our method is also successful in demonstrating that self-supervised methods learn semantic information. Our code is available at: https://github.com/shirgur/AGFVisualization.

#18 Differentially Private Clustering via Maximum Coverage

Authors: Matthew Jones ; Huy L. Nguyen ; Thy D Nguyen

This paper studies the problem of clustering in metric spaces while preserving the privacy of individual data. Specifically, we examine differentially private variants of the k-medians and Euclidean k-means problems. We present polynomial-time algorithms with constant multiplicative error and lower additive error than the previous state of the art for each problem. Additionally, our algorithms use a non-private clustering algorithm as a black box. This allows practitioners to control the trade-off between runtime and approximation factor by choosing a suitable clustering algorithm to use.

#19 Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization

Authors: Kentaro Kanamori ; Takuya Takagi ; Ken Kobayashi ; Yuichi Ike ; Kento Uemura ; Hiroki Arimura

Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One of the popular methods is Counterfactual Explanation (CE), also known as Actionable Recourse, which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining one's desired decision result. In practice, however, showing only a perturbation vector is often insufficient for users to execute the action. The reason is that if there is an asymmetric interaction among features, such as causality, the total cost of the action is expected to depend on the order of changing features. Therefore, practical CE methods are required to provide an appropriate order of changing features in addition to a perturbation vector. For this purpose, we propose a new framework called Ordered Counterfactual Explanation (OrdCE). We introduce a new objective function that evaluates a pair of an action and an order based on feature interaction. To extract an optimal pair, we propose a mixed-integer linear optimization approach with our objective function. Numerical experiments on real datasets demonstrated the effectiveness of our OrdCE in comparison with unordered CE methods.

#20 On Generating Plausible Counterfactual and Semi-Factual Explanations for Deep Learning

Authors: Eoin M. Kenny ; Mark T Keane

There is a growing concern that the recent progress made in AI, especially regarding the predictive competence of deep learning models, will be undermined by a failure to properly explain their operation and outputs. In response to this disquiet, counterfactual explanations have become very popular in eXplainable AI (XAI) due to their asserted computational, psychological, and legal benefits. In contrast, however, semi-factuals (which appear to be equally useful) have surprisingly received no attention. Most counterfactual methods address tabular rather than image data, partly because the non-discrete nature of images makes good counterfactuals difficult to define; indeed, generating plausible counterfactual images that lie on the data manifold is also problematic. This paper advances a novel method for generating plausible counterfactuals and semi-factuals for black-box CNN classifiers in computer vision. The method, called PlausIble Exceptionality-based Contrastive Explanations (PIECE), modifies all “exceptional” features in a test image to be “normal” from the perspective of the counterfactual class, in order to generate plausible counterfactual images. Two controlled experiments compare this method to others in the literature, showing that PIECE generates highly plausible counterfactuals (and the best semi-factuals) on several benchmark measures.

#21 How RL Agents Behave When Their Actions Are Modified

Authors: Eric D. Langlois ; Tom Everitt

Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviours of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, like self-damage.
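One way to write the action-modification setting described above, with illustrative notation: the environment executes a possibly modified action drawn from a modification rule M rather than the policy's chosen action, and an algorithm's asymptotic behaviour then depends on whether its updates condition on the intended or the executed action.

```latex
% Illustrative notation for an MDP with action modification (not necessarily
% the paper's exact definitions): \tilde{a}_t is the executed action, which
% may differ from the policy's chosen action.
\tilde{a}_t \sim M(\cdot \mid s_t, \pi), \qquad
s_{t+1} \sim P(\cdot \mid s_t, \tilde{a}_t), \qquad
r_t = R(s_t, \tilde{a}_t)
```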

#22 Outlier Impact Characterization for Time Series Data

Authors: Jianbo Li ; Lecheng Zheng ; Yada Zhu ; Jingrui He

For time series data, certain types of outliers are intrinsically more harmful for parameter estimation and future predictions than others, irrespective of their frequency. In this paper, for the first time, we study the characteristics of such outliers through the lens of the influence functional from robust statistics. In particular, we consider the input time series as a contaminated process, with the recurring outliers generated from an unknown contaminating process. Then we leverage the influence functional to understand the impact of the contaminating process on parameter estimation. The influence functional results in a multi-dimensional vector that measures the sensitivity of the predictive model to the contaminating process, which can be challenging to interpret, especially for models with a large number of parameters. To this end, we further propose a comprehensive single-valued metric (the SIF) to measure outlier impacts on future predictions. It provides a quantitative measure of the outlier impacts, which can be used in a variety of scenarios, such as the evaluation of outlier detection methods, the creation of more harmful outliers, etc. The empirical results on multiple real data sets demonstrate the effectiveness of the proposed SIF metric.
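The influence functional referred to above builds on the classical robust-statistics notion of measuring how a parameter estimate reacts to an infinitesimal contamination of the nominal distribution; the generic textbook form is shown below, whereas the paper's derivation is specific to contaminated time-series processes.

```latex
% Classical influence functional (generic form): the sensitivity of an
% estimator T at the nominal distribution F to contamination by G.
\mathrm{IF}(G; T, F) \;=\; \lim_{\epsilon \to 0}
\frac{T\big((1-\epsilon)F + \epsilon G\big) - T(F)}{\epsilon}
```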

#23 Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations

Authors: Woo-Jeoung Nam ; Jaesik Choi ; Seong-Whan Lee

The clear transparency of Deep Neural Networks (DNNs) is hampered by complex internal structures and nonlinear transformations along deep hierarchies. In this paper, we propose a new attribution method, Relative Sectional Propagation (RSP), for fully decomposing the output predictions with the characteristics of class-discriminative attributions and clear objectness. We carefully revisit some shortcomings of backpropagation-based attribution methods, which involve trade-offs in decomposing DNNs. We define a hostile factor as an element that interferes with finding the attributions of the target, and propagate it in a distinguishable way to overcome the non-suppressed nature of activated neurons. As a result, it is possible to assign the bi-polar relevance scores of the target (positive) and hostile (negative) attributions while keeping each attribution aligned with its importance. We also present purging techniques that prevent the gap between the relevance scores of the target and hostile attributions from shrinking during backward propagation, by eliminating conflicting units from the channel attribution map. Our method therefore makes it possible to decompose the predictions of DNNs with clearer class-discriminativeness and more detailed elucidation of activated neurons compared to conventional attribution methods. In a verified experimental environment, we report the results of the assessments: (i) Pointing Game, (ii) mIoU, and (iii) Model Sensitivity, on the PASCAL VOC 2007, MS COCO 2014, and ImageNet datasets. The results demonstrate that our method outperforms existing backward decomposition methods while providing distinctive and intuitive visualizations.

#24 Ethical Dilemmas in Strategic Games

Authors: Pavel Naumov ; Rui-Jie Yew

An agent, or a coalition of agents, faces an ethical dilemma between several statements if she is forced to make a conscious choice about which of these statements will be true. This paper proposes to capture ethical dilemmas as a modality in strategic game settings, with and without a limit on sacrifice, and for games with perfect and imperfect information. The authors show that the dilemma modality cannot be defined through the earlier proposed blameworthiness modality. The main technical result is a sound and complete axiomatization of the properties of this modality with sacrifice in games with perfect information.

#25 Comprehension and Knowledge

Authors: Pavel Naumov ; Kevin Ros

The ability of an agent to comprehend a sentence is tightly connected to the agent's prior experiences and background knowledge. The paper suggests interpreting comprehension as a modality and proposes a complete bimodal logical system that describes the interplay between the comprehension and knowledge modalities.